-
-
Couldn't load subscription status.
- Fork 19.2k
TST/CLN: Add tests for __finalize__ with merge #62266
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TST/CLN: Add tests for __finalize__ with merge #62266
Conversation
…tadata Updated my contribution with the latest changes from the main pandas repo.
…tadata Updated pull request with latest changes from main Pandas repo.
…tadata Updated my pull request with changes from main Pandas repo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR. The OP and the whatsnew indicate that there is a code change occurring here, yet I do not see any. Only comments, docstrings, type-hints, and tests.
|
It appears to me that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR!
doc/source/whatsnew/v3.0.0.rst
Outdated
| - Bug in :meth:`DataFrame.unstack` producing incorrect results when manipulating empty :class:`DataFrame` with an :class:`ExtentionDtype` (:issue:`59123`) | ||
| - Bug in :meth:`concat` where concatenating DataFrame and Series with ``ignore_index = True`` drops the series name (:issue:`60723`, :issue:`56257`) | ||
| - Bug in :func:`melt` where calling with duplicate column names in ``id_vars`` raised a misleading ``AttributeError`` (:issue:`61475`) | ||
| - Bug in :meth:`DataFrame.merge` where the result of a merge does not contain any metadata or flag information from the inputs to the merge. (:issue:`28283`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned, remove the note
pandas/core/reshape/merge.py
Outdated
|
|
||
| result = self._reindex_and_concat(join_index, left_indexer, right_indexer) | ||
|
|
||
| # Is this call to __finalize__ really necessary? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I believe this can be removed.
pandas/core/reshape/merge.py
Outdated
| # __finalize is responsible for copying the metadata from the inputs to merge | ||
| # to the result. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you remove this comment, it does not add any information that isn't already documented in __finalize__ itself.
pandas/core/reshape/merge.py
Outdated
| Add one indicator column to each of the left and right inputs to a | ||
| merge operation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: short summary should be 1 line. I think you can just remove to a merge operation
pandas/core/generic.py
Outdated
| ``attrs`` attribute, in which case, each such ``attrs`` instance | ||
| must be a dictionary that is equal to all of the others. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By saying "must be a dictionary that is equal to all of the others", it sounds like that is a requirement for calling this function but that is not the case. Rather, it just won't propagate when this requirement is not met.
| """Check that DataFrame.merge correctly sets the allow_duplicate_labels flag | ||
| on its result. | ||
| The flag on the result should be set to true if and only if both arguments | ||
| to merge have their flags set to True. | ||
| """ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you remove docstrings from tests, we generally do not add these. Comments are fine if they are adding something that is not already in the code itself.
| The flag on the result should be set to true if and only if both arguments | ||
| to merge have their flags set to True. | ||
| """ | ||
| # Arrange |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you remove these arrange/act/assert comments.
| left.attrs = {"b": 3} | ||
| assert result.attrs == metadata |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this will check whether or not a deep copy is made. Recommend doing
assert result.attrs is not left.attrs
assert result.attrs["a"] is not left.attrs["a"]
instead. Also I recommend adding in right as well for good measure.
| no_metadata = pd.DataFrame({"test": [1]}) | ||
|
|
||
| has_metadata = pd.DataFrame({"test": [1]}) | ||
| has_metadata.attrs = {"a": 2} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of having module level variables, can you use indicator strings in the parametrization (e.g. "no_metadata", "has_metadata") and then do something like:
if left == "no_metadata":
left = pd.DataFrame(...)
else:
...| else: | ||
| return str(x) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like just a style preference? Can you revert this change.
|
The pre-commit checks prevent me from reverting the formatting changes you mentioned above. Specifically, the three lines of the form in |
I am able to revert the changes without problem. Ensure that you are not leaving a trailing comma on the line. Alternatively, I'd be happy to push a commit, but you need to check the box on the right side of the PR that allows maintainers to edit. |
|
I removed the trailing commas from the tests. Let me know if there's anything else you need from this PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
|
Thanks @aijams! |
doc/source/whatsnew/vX.X.X.rstfile if fixing a bug or adding a new feature.[Edited by @rhshadrach]
This PR changes the final call toDataFrame.__finalize__withinpandas.mergeto propagate the flags from both of its arguments and the metadata from its left argument.Improves docstrings and adds tests for
__finalize__with merge.